1  Introduction


Consider the following two statements about the game of chess.

A king on an empty board can reach every square.

*  A  knight  on  an  empty  board  can  reach  every  square. 

The  first  statement  is  clearly  true.  The  second  statement,  while  true,  is  not 
obvious.  There  is  an  analogy  between  the  notion  of  an  “obvious”  statement  and 
the  notion  of  a  grammatical  sentence.  By  analogy  with  linguistic  practice,  an 
asterisk  has  been  written  in  front  of  the  second  statement  to  indicate  that  it  is  not 
obvious. 

The classification of a given statement as being either obvious or non-obvious
will be called a cognitive judgment. In this paper we investigate the possibility of
constructing analytic theories of cognitive judgments analogous to analytic theories
of grammar, that is, theories that predict which statements are obvious and which
statements are not. One particular theory of a class of cognitive judgments, so-called
“inductive” judgments, is given in this paper.

Before considering a particular analytic theory of cognitive judgments, it is useful to
consider some further examples.

Consider  a  graph  with  colored  nodes  such  that  every  arc  connects  two  nodes 
of  the  same  color.  Any  two  nodes  connected  by  a  path  of  arcs  are  the  same 
color. 

*  Any  graph  with  five  nodes  and  five  edges  contains  a  cycle. 

A five inch by six inch rectangle can be divided into squares where each
square is one inch on a side.

*  A five inch by six inch rectangle can be divided into squares where each
square is larger than one inch on a side.

Intuitively,  a  statement  is  obvious  if  it  is  immediate  —  one  judges  it  to  be  true 
without  experiencing  intervening  thoughts.  The  unstarred  examples  given  above 
are  obvious  in  this  sense.  If  a  statement  is  not  immediately  true,  but  can  be  seen 
to  be  true  by  considering  some  number  of  cases,  examples,  or  other  statements, 
then the statement is not obvious. The starred examples are not obvious.
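The knight statement is a typical case of a claim that is settled by considering cases. As a minimal sketch, the cases can be enumerated by a breadth-first search over the board. The function name `reachable`, the coordinate encoding, and the 8 by 8 board size are assumptions made for this illustration.

```python
from collections import deque

# Move offsets for each piece (illustrative encoding, not from the paper).
KING = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
KNIGHT = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

def reachable(start, moves, size=8):
    """Return the set of squares reachable from `start` on an empty board."""
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            on_board = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if on_board and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(len(reachable((0, 0), KING)))    # 64
print(len(reachable((0, 0), KNIGHT)))  # 64
```

Both searches cover all 64 squares, but the search only establishes the truth of the two statements. It says nothing about why the king statement is judged obvious and the knight statement is not.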

Obviousness cannot be equated with truth: many nonobvious statements
are true, such as the starred statements above, and some obvious statements are
not true. As an example of an obvious statement that is not true, consider the
statement “in a finite interval of time, a bouncing ball can only bounce a finite
number of times.” One can at least argue that this statement is false. This
statement requires the additional assumption that there is a lower limit on the
time taken by an individual bounce. In fact, to a close approximation, the time
taken by successive bounces of a bouncing ball decreases geometrically with each
bounce. This approximation predicts an infinite number of bounces in a finite
amount of time. The mathematical model of geometrically decreasing bounce time
is self-consistent and provides a counterexample to the statement. Fortunately, it
seems possible to construct a predictive theory of cognitive judgments independent
of the semantic notion of truth.
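To make the counterexample concrete, suppose the first bounce takes time $t_0$ and each later bounce takes a fixed fraction $r$ of the time of the previous one. The symbols $t_0$, $r$, and $T$ are introduced here for illustration only. The total time taken by infinitely many bounces is then a convergent geometric series:

```latex
T \;=\; \sum_{k=0}^{\infty} t_0 \, r^{k} \;=\; \frac{t_0}{1 - r} \;<\; \infty,
\qquad 0 < r < 1 .
```

Under this model every bounce takes positive time, yet no positive lower limit on bounce time exists, which is exactly the assumption the "obvious" statement needs.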

Linguistic theories of grammaticality are generative: an infinite set of grammatical
sentences is generated by a finite grammar. Theories of cognitive judgments
can also be generative. A sequent is an expression of the form $\Sigma \vdash \Phi$ where
$\Sigma$ is a set of formulas (premises) and $\Phi$ is a formula that may or may not be
derivable from $\Sigma$. A cognitive judgment can be formalized as a sequent plus a
specification of whether that sequent is obvious or nonobvious. We say that a set
of inference rules generates the sequent $\Sigma \vdash \Phi$ if $\Phi$ can be derived from $\Sigma$ using
those rules. In order for a rule set to be a good predictor of cognitive judgments
it should generate all, and only, the obvious sequents.

The  theory  of  cognitive  judgments  presented  here  is  linguistic  —  it  is  based  on 
a  particular  knowledge  representation  language,  a  knowledge  base,  and  inference 
rules.  Of  course,  one  can  imagine  non-linguistic  theories  of  obviousness  —  for 
example,  a  theory  based  on  “visual”  processing.  It  remains  to  be  seen  whether  one 
can  find  image-processing  theories  of  cognitive  judgments  with  the  same  predictive 
power  as  linguistic  theories. 

The remainder of this paper can be divided into two parts. The first discusses
local rule sets and their role in theories of cognitive judgments. The general concept
of a local rule set was introduced in [McAllester, 1990]. The second part of the
paper discusses a class of “inductive” cognitive judgments. An inductive cognitive
judgment consists of an obvious sequent where a formal (syntactic) derivation of
the sequent appears to require reasoning by mathematical induction. Although no
local rule set has been found that incorporates a rule for mathematical induction,
aspects of the theory of local rule sets can be used to construct a formal theory of
inductive cognitive judgments.


2  Local  Inference  Rules 


In linguistic theories of syntax it is usually easy to determine whether or not a given
string of words can be generated by a given grammar. For example, given any
particular context-free grammar one can determine whether or not a given string
is generated by that grammar in $n^3$ time where n is the length of the string. Most
well known sets of inference rules are different from grammars in the sense that
it is difficult to determine if a given sequent is generated by the inference rules.
Inference, unlike parsing, tends to be computationally intractable.
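For concreteness, cubic-time recognition can be sketched with the CYK algorithm, which applies to grammars in Chomsky normal form. The choice of algorithm, the function name `cyk`, and the toy grammar for strings of the form a^n b^n are assumptions made for illustration; the text above does not name a particular parsing method.

```python
from itertools import product

def cyk(words, unary, binary, start="S"):
    """CYK recognition for a grammar in Chomsky normal form.

    unary:  maps a terminal to the set of nonterminals A with A -> terminal
    binary: maps a pair (B, C) to the set of nonterminals A with A -> B C
    """
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][0] = set(unary.get(w, ()))
    for length in range(2, n + 1):           # span length
        for i in range(n - length + 1):      # span start
            for split in range(1, length):   # size of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b, c in product(left, right):
                    table[i][length - 1] |= binary.get((b, c), set())
    return start in table[0][n - 1]

# Toy grammar: S -> A B | A X,  X -> S B,  A -> 'a',  B -> 'b'
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}

print(cyk(list("aabb"), UNARY, BINARY))  # True
print(cyk(list("abab"), UNARY, BINARY))  # False
```

The three nested loops over span length, start position, and split point give the cubic dependence on the length of the string for a fixed grammar.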

The apparent computational intractability of inference is a problem in the theory
of cognitive judgments for two reasons. First, one can argue that it is psychologically
implausible that the human set of obvious statements is computationally
intractable. Second, and perhaps more significantly, a computationally intractable
theory is difficult to test against observed data. Given a theory of cognitive judgments,
and a sequent that is judged to be non-obvious, one must show that the
theory does not generate the sequent. For a complex rule set this can be difficult.

Fortunately, there is a class of sets of inference rules, the local rule sets, that are
analogous to context free grammars: for a given local rule set one can determine,
in polynomial time in the size of a statement, whether or not that statement is
generated by the rule set. Let $R$ be a set of inference rules. The following definitions
and lemma are from [McAllester, 1990].

Definition: We write $\Sigma \Vdash_R \Phi$ if there exists a proof of $\Phi$ from the
premise set $\Sigma$ such that every proper subexpression of a formula used
in the proof appears as a proper subexpression of $\Phi$, a proper subexpression
of some formula in $\Sigma$, or as a closed (variable free) expression
in the rule set $R$.

Lemma: For any fixed rule set $R$, there exists a procedure for determining
whether or not $\Sigma \Vdash_R \Phi$ which runs in time polynomial in the
written length of $\Sigma$ and $\Phi$.

We write $\Sigma \vdash_R \Phi$ if there exists any proof of $\Phi$ from $\Sigma$ using the inference
rules in $R$. The inference relation $\Vdash_R$ is a restricted version of $\vdash_R$. For any rule set
$R$, the relation $\Vdash_R$ is polynomial time decidable. If the relation $\vdash_R$ is intractable,
as is the case for any sound and complete set of rules for first order logic, then the
polynomial time relation $\Vdash_R$ will be weaker than the relation $\vdash_R$.

Definition: The rule set $R$ is called local if the relation $\Vdash_R$ is the same
as the relation $\vdash_R$.

An immediate consequence of the above definitions and lemma is that local
rule sets are tractable, i.e., they generate polynomial time decidable inference
relations. A variety of nontrivial local rule sets is presented in [McAllester, 1990].
An application of local rule sets to Schubert’s steamroller is described in [Givan et
al., 1991].




3  Inductive  Cognitive  Judgments 


There is a class of cognitive judgments, that I will call inductive judgments, which
appear to be most simply analyzed by hypothesizing inference rules for mathematical
induction. Consider the following examples, some of which are given above.

By  walking  north,  a  person  can  never  get  south  of  where  they  started. 

If  a  maze  containing  a  rat  is  placed  in  a  sealed  box  then,  no  matter  where 
the  rat  runs  in  the  maze,  it  will  not  get  outside  of  the  box. 

Consider a graph with colored nodes such that every arc connects two nodes
of  the  same  color.  Any  two  nodes  connected  by  a  path  of  arcs  are  the  same 
color. 

A scrambled Rubik’s cube is solvable, i.e., there exists a sequence of moves
that  will  unscramble  the  cube. 

Given  a  bag  of  marbles,  if  marbles  are  removed  one  at  a  time,  eventually 
the  bag  will  be  empty. 

A  king,  on  an  empty  chess  board,  can  reach  every  square. 

To construct a set of inference rules that generates each of the above obvious
statements, one must ask how these statements might be syntactically derived.
The first three judgments can be seen as special cases of the following general
principle.

For any action A and property P, if P is true in the initial state, and, in
any state where P is true, P remains true after performing action A, then
P will be true in any state resulting from any number of applications of A
to the initial state.

This general principle can be used to account for the walking north example if
we assume that “not south of the initial position” is a property and “walk north”
is an action. In the rat and maze example, “in the box” is a property preserved by
“moving in the maze”. In the colored graph example, “being the initial color” is a
property that is preserved by “crossing an arc in the graph”. The general principle,
as stated above, is virtually isomorphic to the statement of the induction principle
for natural numbers. Although the last three judgments do not appear to be direct
applications of the above general principle, they all correspond to statements whose
formal derivation appears to involve mathematical induction. The Rubik’s cube
statement can be proved by induction on the number of moves used to scramble the
cube. The bag-of-marbles statement can be proved by induction on the number
of marbles in the bag. The king-on-a-chess-board statement can be proved by
induction on the distance between the king and a target square.
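The correspondence with induction on the natural numbers can be written out side by side. In the sketch below the symbols $s_0$ (the initial state), $A^n$ (n applications of the action), and $Q$ (a property of numbers) are notation assumed for this illustration, and the action is treated as a function on states, which simplifies examples such as the rat in the maze where many moves are possible.

```latex
% General principle for an action A and a property P
\bigl[\, P(s_0) \;\wedge\; \forall s \, \bigl( P(s) \rightarrow P(A(s)) \bigr) \bigr]
  \;\rightarrow\; \forall n \ge 0 \;\; P\bigl(A^{n}(s_0)\bigr)

% Induction on the natural numbers: 0 plays the role of s_0, successor the role of A
\bigl[\, Q(0) \;\wedge\; \forall n \, \bigl( Q(n) \rightarrow Q(n+1) \bigr) \bigr]
  \;\rightarrow\; \forall n \;\; Q(n)
```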

4  Polynomial  Time  Inductive  Inference

This section gives a rule set that includes an inference rule for mathematical
induction and that can be used to provide at least a partial analysis of each of the
obvious statements given in the previous section. Although the rule set is not local,
the theoretical framework of local inference relations can be used to construct a
polynomial time inference procedure based on this nonlocal rule set.

The inference rules are stated in a particular knowledge representation language.
Although a denotational semantics is not required for a formal theory of
cognitive judgments, the inference rules are much easier to “understand” and
remember if such a semantics is provided. The knowledge representation language
given here has been designed to be the simplest possible language in which an
induction rule can be incorporated into a local rule set. The language contains
a Kleene star operation to express an indeterminate number of iterations of an
operation. The induction rule is similar to the induction rule of propositional
dynamic logic [Pratt, 1976], [Harel, 1984], [Kozen and Tiuryn, 1990]. The language
described here is also closely related to the knowledge representation language
described in [McAllester et al., 1989].

The classical syntax for first order logic involves two grammatical categories:
formulas and terms. The knowledge representation language described here
also involves two syntactic categories: formulas and class expressions. Formulas
denote truth values and class expressions denote sets.

•  A class expression is one of the following.

   -  A class symbol.

   -  An expression of the form (R C) where R is a binary relation symbol and
      C is a class expression.

   -  An expression of the form (R* C) where R is a binary relation symbol and
      C is a class expression.

•  A formula is an expression of the form (every C W) where C and W are class
   expressions.

A semantic model of the language defined above consists of an assignment of a
set to every class symbol and an assignment of a binary relation (a set of pairs) to
every binary relation symbol. If M is a semantic model then we write V(e, M) for
the semantic value of the expression e in the model M. If C is a class expression
then V(C, M) is a set and, if $\Phi$ is a formula, V($\Phi$, M) is a truth value, either T
or F. The semantic value function V is defined as follows.

•  If P is a class symbol then V(P, M) is the set that is the interpretation of P in
   M.

•  V((R C), M) is the set of all d such that there exists an element d' in V(C, M)
   such that the pair <d, d'> is an element of the relation denoted by R.

•  V((R* C), M) is the union of V(C, M), V((R C), M), V((R (R C)), M),
   V((R (R (R C))), M), ....

•  V((every C W), M) is T if V(C, M) is a subset of V(W, M), and F otherwise.

As an example, suppose that a-red-node is a class symbol that denotes the
set of all the red nodes in some particular graph. Suppose that a-neighbor-of
denotes the binary relation that contains the pair <d, d'> just in case d and d'
are nodes of the graph and there is an arc between d and d'. In this case the class
(a-neighbor-of a-red-node) denotes the set of all nodes that are one arc away
from a red node. The class (a-neighbor-of* a-red-node) is the set of all nodes
that can be reached by crossing zero or more arcs from a red node.
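The semantic value function can be sketched as a small evaluator over finite models. The encoding is an assumption made for this illustration: a class symbol is a string, (R C) is the pair ("R", C), (R* C) is the pair ("R*", C), and (every C W) is the triple ("every", C, W). The function name `value` and the five-node example graph are likewise hypothetical.

```python
def value(expr, classes, relations):
    """Semantic value V(expr, M) for the finite model M = (classes, relations)."""
    if isinstance(expr, str):                      # class symbol
        return set(classes[expr])
    head, *args = expr
    if head == "every":                            # formula (every C W)
        c, w = args
        return value(c, classes, relations) <= value(w, classes, relations)
    (c,) = args
    base = value(c, classes, relations)
    if head.endswith("*"):                         # (R* C): iterate to a fixed point
        pairs = relations[head[:-1]]
        result, frontier = set(base), set(base)
        while frontier:
            frontier = {d for (d, d2) in pairs if d2 in frontier} - result
            result |= frontier
        return result
    pairs = relations[head]                        # (R C)
    return {d for (d, d2) in pairs if d2 in base}

# Example graph (hypothetical): arcs 1-2, 2-3, 4-5, with node 1 red.
ARCS = [(1, 2), (2, 3), (4, 5)]
RELATIONS = {"a-neighbor-of": set(ARCS) | {(b, a) for (a, b) in ARCS}}
CLASSES = {"a-red-node": {1}}

print(value(("a-neighbor-of", "a-red-node"), CLASSES, RELATIONS))   # {2}
print(value(("a-neighbor-of*", "a-red-node"), CLASSES, RELATIONS))  # {1, 2, 3}
```

On a finite model the union defining (R* C) stabilizes after finitely many steps, so the fixed-point loop terminates. Nodes 4 and 5 are never reached because no path of arcs connects them to a red node.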

Figure 1 gives a sound set of inference rules for the above knowledge representation
language.¹ For ease of exposition, let (R^n C) abbreviate (R (R ... (R
C))) with n occurrences of R. The expression (R* C) denotes the union over all
$n \ge 0$ of (R^n C). Inference rules 5 and 6, together with rules 2 and 3, ensure that
for any $n \ge 0$ we have (every (R^n C) (R* C)). Inference rule 7 is an induction
rule. Consider the statement that if every neighbor of a red node is red, then every
node connected by some path of arcs to a red node is also red. This statement
contains a premise equivalent to the formula

(every (a-neighbor-of a-red-node) a-red-node).

Inference rule 7 allows us to immediately conclude the formula

(every (a-neighbor-of* a-red-node) a-red-node).

(1)  (every C C)

(2)  (every C W)
     (every W Z)
     -----------
     (every C Z)

(3)  (every C W)
     -------------------
     (every (R C) (R W))

(4)  (every C W)
     ---------------------
     (every (R* C) (R* W))

(5)  (every C (R* C))

(6)  (every (R (R* C)) (R* C))

(7)  (every (R C) C)
     ----------------
     (every (R* C) C)

Figure 1: Some inference rules

¹ These rules are apparently not semantically complete. The formula (every (R* A) B)
semantically follows from (every A B), (every A C), (every (R B) C), and
(every (R C) B). However, there appears to be no proof using the above inference rules. A
proof could be constructed, however, if we allow intersection class expressions with appropriate
inference rules for intersection. In that case we could show that R preserves the intersection of
B and C.

Let I (for Induction) be the rule set given in figure 1. Recall that the inference
relation $\Vdash_I$ is a restricted version of the inference relation $\vdash_I$. The restricted
inference relation $\Vdash_I$ is polynomial time decidable. By definition, the rule set I is
local if and only if these two relations are the same. Unfortunately, the rule set I
is not local. In particular we have

{(every A B), (every (R B) B)} $\vdash_I$ (every (R* A) B)

but

{(every A B), (every (R B) B)} $\not\Vdash_I$ (every (R* A) B).

The problem is that the proof underlying the first sequent involves the formula
(every (R* B) B) (this is derived from inference rule 7 and the desired result
can then be derived from inference rules 4 and 2). Unfortunately, the formula
(every (R* B) B) is not local: the class expression (R* B) does not appear in
the desired sequent. Since only local formulas are allowed in proofs underlying the
inference relation $\Vdash_I$, this proof cannot be used to generate the second sequent
listed above.
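The failure of locality can be checked mechanically. The following sketch forward-chains the seven rules of figure 1 while admitting only class expressions from a given set. The encoding is an assumption made for this illustration: a formula (every C W) is the pair (C, W), and a class expression is a symbol or a pair such as ("R*", "A"). The names `subexpressions` and `closure_I` are hypothetical.

```python
def subexpressions(expr):
    """All class expressions occurring in `expr` (a symbol or a pair (op, arg))."""
    if isinstance(expr, str):
        return {expr}
    return {expr} | subexpressions(expr[1])

def closure_I(premises, allowed, relations=("R",)):
    """Close `premises` under rules 1-7, using only class expressions in `allowed`."""
    facts = set(premises) | {(c, c) for c in allowed}              # rule 1
    for r in relations:
        star = r + "*"
        for c in allowed:
            if (star, c) in allowed:
                facts.add((c, (star, c)))                          # rule 5
                if (r, (star, c)) in allowed:
                    facts.add(((r, (star, c)), (star, c)))         # rule 6
    changed = True
    while changed:
        new = set()
        for (c, w) in facts:
            for (w2, z) in facts:
                if w == w2:
                    new.add((c, z))                                # rule 2
            for r in relations:
                star = r + "*"
                if (r, c) in allowed and (r, w) in allowed:
                    new.add(((r, c), (r, w)))                      # rule 3
                if (star, c) in allowed and (star, w) in allowed:
                    new.add(((star, c), (star, w)))                # rule 4
                if c == (r, w) and (star, w) in allowed:
                    new.add(((star, w), w))                        # rule 7
        changed = not (new <= facts)
        facts |= new
    return facts

PREMISES = {("A", "B"), (("R", "B"), "B")}
GOAL = (("R*", "A"), "B")
LOCAL = set()
for c, w in PREMISES | {GOAL}:
    LOCAL |= subexpressions(c) | subexpressions(w)

print(GOAL in closure_I(PREMISES, LOCAL))                  # False
print(GOAL in closure_I(PREMISES, LOCAL | {("R*", "B")}))  # True
```

With only the class expressions A, B, (R B), and (R* A) available, the goal is not derived. Admitting the single nonlocal expression (R* B) lets rule 7, rule 4, and rule 2 complete the proof. Since the number of formulas over a fixed set of class expressions is quadratic in the size of that set, the closure is computed in polynomial time.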

A second set of inference rules is given in figure 2. Let I' be the rule set given in
figure 2. For sequents that do not involve formulas of the form (preserves R C),
the inference relation $\vdash_{I'}$ is equivalent to the inference relation $\vdash_I$. However, the
restricted relation $\Vdash_{I'}$ is considerably more powerful than the restricted relation
$\Vdash_I$. For example, we have

{(every A B), (every (R B) B)} $\Vdash_{I'}$ (every (R* A) B).

As with all locally restricted inference relations, the inference relation $\Vdash_{I'}$ is
polynomial time decidable. We can take $\Vdash_{I'}$ as a generative theory of obvious
inductive sequents, although any theory with reasonable coverage of the actual obvious
sequents would require a richer knowledge representation language and more
inference rules.
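The local derivation of this sequent can be reproduced with a forward-chaining sketch restricted to the class expressions of the sequent. The sketch relies on an assumed reading of the rules in figure 2, in particular that rule 6 derives (preserves R C) from (every C W) and (every (R W) C), that rule 8 derives (preserves R W) from (every C W), (every W C), and (preserves R C), and that rule 10 derives (every (R* C) W) from (every C W) and (preserves R W). The tuple encoding and the name `closure_I_prime` are also assumptions made for illustration.

```python
def closure_I_prime(premises, allowed, relations=("R",)):
    """Close `premises` under the ten rules, using only class expressions in `allowed`.

    Facts are ("every", C, W) or ("preserves", R, C).
    """
    facts = set(premises)
    for c in allowed:
        facts.add(("every", c, c))                                 # rule 1
        for r in relations:
            if (r + "*", c) in allowed:
                facts.add(("every", c, (r + "*", c)))              # rule 5
                facts.add(("preserves", r, (r + "*", c)))          # rule 9
    while True:
        new = set()
        every = [(f[1], f[2]) for f in facts if f[0] == "every"]
        keeps = {(f[1], f[2]) for f in facts if f[0] == "preserves"}
        for (c, w) in every:
            for (w2, z) in every:
                if w == w2:
                    new.add(("every", c, z))                       # rule 2
            for r in relations:
                star = r + "*"
                if (r, c) in allowed and (r, w) in allowed:
                    new.add(("every", (r, c), (r, w)))             # rule 3
                if (star, c) in allowed and (star, w) in allowed:
                    new.add(("every", (star, c), (star, w)))       # rule 4
                if ("every", (r, w), c) in facts:
                    new.add(("preserves", r, c))                   # rule 6
                if (r, w) in keeps:
                    if (r, c) in allowed:
                        new.add(("every", (r, c), w))              # rule 7
                    if (star, c) in allowed:
                        new.add(("every", (star, c), w))           # rule 10
                if (r, c) in keeps and ("every", w, c) in facts:
                    new.add(("preserves", r, w))                   # rule 8
        if new <= facts:
            return facts
        facts |= new

# Class expressions that occur in the sequent itself.
LOCAL = {"A", "B", ("R", "B"), ("R*", "A")}
PREMISES = {("every", "A", "B"), ("every", ("R", "B"), "B")}
GOAL = ("every", ("R*", "A"), "B")

print(GOAL in closure_I_prime(PREMISES, LOCAL))  # True
```

Under this reading, rule 6 applied to (every B B) and (every (R B) B) yields (preserves R B), and rule 10 then yields the goal from (every A B). No class expression outside the sequent is needed.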

Unfortunately, the expanded rule set I' is still not local: there are sequents
generated by $\vdash_{I'}$ that are not generated by $\Vdash_{I'}$. However, these examples are
difficult to find and seem to have little, if any, significance in practice. It is not
known whether the rule set I' can be further expanded to a truly local rule set.

(1)  (every C C)

(2)  (every C W)
     (every W Z)
     -----------
     (every C Z)

(3)  (every C W)
     -------------------
     (every (R C) (R W))

(4)  (every C W)
     ---------------------
     (every (R* C) (R* W))

(5)  (every C (R* C))

(6)  (every C W)
     (every (R W) C)
     ---------------
     (preserves R C)

(7)  (every C W)
     (preserves R W)
     ---------------
     (every (R C) W)

(8)  (every C W)
     (every W C)
     (preserves R C)
     ---------------
     (preserves R W)

(9)  (preserves R (R* C))

(10) (every C W)
     (preserves R W)
     ----------------
     (every (R* C) W)

Figure 2: An equivalent, more nearly local, rule set

It seems likely that one can construct large local rule sets that include rules for
mathematical induction. Such rule sets, or even “nearly local” rule sets such as
that given in figure 2, may have important engineering applications in areas such
as automatic program verification.


5  Conclusion 


Cognitive judgments, i.e., judgments about whether a given statement is obviously
true, can be viewed as a source of data about the structure of human cognition.
Although there is a rich source of fairly unambiguous cognitive judgments,
it appears impossible to gain direct introspective access to the underlying
computational mechanisms. On the other hand, it does appear possible to construct
generative analytic theories of these judgments.

Analytic theories of cognitive judgments can be viewed as being analogous to
analytic theories of grammaticality. Local rule sets are analogous to context free
grammars in that there exists a procedure for determining, in polynomial time
in the size of a sequent, whether or not the rule set generates that sequent. Local
rule sets provide a formal framework for the construction of linguistic theories of
cognitive judgments.

This paper is, at best, only a first step in the construction of compelling
predictive theories of cognitive judgments. A richer language is clearly needed for
expressing generative inference rules. A theory is needed of the translation of
English sentences into formulas of the internal knowledge representation language.
Ideally, the knowledge representation language used to express cognitive inference
rules should be the same as the language used to express the logical form (semantic
representation) of natural language statements. This would allow existing theories
of logical form to be used in constructing theories of cognitive judgments. It
remains to be seen whether the basic approach outlined here can lead to a convincing
integrated theory of linguistic logical form and cognitive judgment data.